Clustering Semantically Equivalent Words into Cognate Sets in Multilingual Lists

Authors

  • Bradley Hauer
  • Grzegorz Kondrak
Abstract

Word lists have become available for most of the world’s languages, but only a small fraction of such lists contain cognate information. We present a machine-learning approach that automatically clusters words in multilingual word lists into cognate sets. Our method incorporates a number of diverse word similarity measures and features that encode the degree of affinity between pairs of languages. The output of the classification algorithm is then used to generate cognate groups. The results of the experiments on word lists representing several language families demonstrate the utility of the proposed approach.
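The pipeline described above (score word pairs, decide which pairs are cognate, then form groups from those decisions) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a single similarity measure (normalized edit distance) where the abstract mentions several, it substitutes a fixed threshold for the trained classifier, it omits the language-pair affinity features, and it groups words by connected components, which is only one of several possible grouping strategies. The names `edit_distance`, `similarity`, `cluster_cognates`, and the `threshold` parameter are hypothetical.

```python
from itertools import combinations


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed one table row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Similarity in [0, 1]: 1 minus edit distance over the longer length."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(a, b) / longest


def cluster_cognates(entries, threshold=0.6):
    """Group (language, word) pairs that share one meaning.

    Assumption: a pair counts as cognate when its similarity reaches
    `threshold`. This stands in for a trained pairwise classifier.
    Groups are the connected components of the resulting pair graph.
    """
    parent = list(range(len(entries)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(entries)), 2):
        if similarity(entries[i][1], entries[j][1]) >= threshold:
            parent[find(i)] = find(j)

    groups = {}
    for i, entry in enumerate(entries):
        groups.setdefault(find(i), []).append(entry)
    return list(groups.values())


if __name__ == "__main__":
    words = [("en", "hand"), ("de", "hand"), ("es", "mano")]
    for group in cluster_cognates(words):
        print(group)
```

Connected components merge two groups as soon as a single pair crosses the threshold, so one spurious match can chain unrelated words together. A stricter grouping rule, such as requiring a high average similarity between groups, trades recall for precision.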

Similar resources

The Effect of Teaching Vocabulary through Synonymous, Semantically Unrelated, and Hyponym Sets on EFL Learners’ Retention

Many textbooks include semantically related words and sometimes teachers add synonyms, antonyms, etc. to the words in order to present new vocabulary items without questioning the possible effects. This study sought to investigate the effect of teaching vocabulary through synonym, semantically unrelated, and hyponym sets based on Higa’s (1963) proposed continuum. A total of 120 Iranian intermed...

The Impact of Semantic Clustering on Iranian EFL Advanced Learners’ Vocabulary Retention

This study investigated the impact of semantic clustering on Iranian EFL learners’ vocabulary retention at advanced level. Participants were female learners randomly assigned to two groups of 15. Four instruments (TOEFL test; vocabulary pretest; immediate posttest, and delayed recall posttest) were used. The experimental group underwent semantic clustering vocabulary presentation in which the l...

Fast and unsupervised methods for multilingual cognate clustering

In this paper we explore the use of unsupervised methods for detecting cognates in multilingual word lists. We use online EM to train sound segment similarity weights for computing similarity between two words. We tested our online systems on sixteen geographically dispersed language groups of the world and show that the Online PMI (Pointwise Mutual Information) system outperforms an HMM ...

Using support vector machines and state-of-the-art algorithms for phonetic alignment to identify cognates in multi-lingual wordlists

Most current approaches in phylogenetic linguistics require as input multilingual word lists partitioned into sets of etymologically related words (cognates). Cognate identification is so far done manually by experts, which is time consuming and as of yet only available for a small number of well-studied language families. Automatizing this step will greatly expand the empirical scope of phylog...

Cognitively Salient Relations for Multilingual Lexicography

Providing sets of semantically related words in the lexical entries of an electronic dictionary should help language learners quickly understand the meaning of the target words. Relational information might also improve memorisation, by allowing the generation of structured vocabulary study lists. However, an open issue is which semantic relations are cognitively most salient, and should theref...

Publication date: 2011